Issue 3456 Updated docs/designs/postgresql-state-store.md based on initial exper… #3457

gaffer01 · 2024-10-10T15:17:28Z

…imentation

Make sure you have checked all steps below.

Issue

My PR addresses the following issues and references them in the PR title. For example, "Issue 1234 - My Sleeper
PR"
- Resolves Update description of design of potential PostgreSQL state store #3456

Tests

My PR adds the following tests OR does not need testing for this extremely good reason:
- Documentation only

Documentation

[N/A] In case of new functionality, my PR adds documentation that describes how to use it, or I have linked to a
separate issue for that below.
[N/A] If I have added, removed, or updated any external dependencies used in the project, I have updated the
NOTICES file to reflect this.


patchwork01 · 2024-10-11T07:31:35Z

docs/designs/postgresql-state-store.md

+item in DynamoDB, with optimistic concurrency control used to add new transactions. To avoid reading the entire
+history of transactions when querying or updating the state store, we periodically create snapshots of the state.
+To get an up-to-date view of the state store, the latest snapshot is queried and then updated by reading all subsequent
+transactions from the DyanmoDB transaction log. This state store enforces a sequential ordering to all the updates to


There's a typo at "DyanmoDB".

patchwork01 · 2024-10-11T07:37:56Z

docs/designs/postgresql-state-store.md

+file references and delete them when a compaction job finishes, we will lose track of what files are in the system
+and not be able to garbage collect them. We can avoid this problem as follows. When we add a file and some references,
+we also add a dummy file reference, i.e. one where the partiton id is "DUMMY". Normal operations on the state store
+ignore these entries. When all non-dummy references to a file have been removed, only the dummy reference will remain.


It looks like all we need from the dummy record is to know that that file exists. If we make a separate table for which files exist, we could avoid needing to set values for fields that don't have any meaning for this record. Is there any downside to having a separate table for that?
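To make the separate-table suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL. The table names (`files`, `file_references`), the column names and the function names are all hypothetical, not taken from the design document. The point is only that a file stays discoverable for garbage collection after its last reference is removed, without a "DUMMY" partition id.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one row per file that exists, plus one row per reference.
    conn.execute("CREATE TABLE files (filename TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE file_references ("
        " filename TEXT NOT NULL REFERENCES files(filename),"
        " partition_id TEXT NOT NULL,"
        " PRIMARY KEY (filename, partition_id))"
    )


def add_file(conn, filename, partition_ids):
    # Record the file and its references in a single transaction.
    with conn:
        conn.execute("INSERT INTO files (filename) VALUES (?)", (filename,))
        conn.executemany(
            "INSERT INTO file_references (filename, partition_id) VALUES (?, ?)",
            [(filename, partition_id) for partition_id in partition_ids],
        )


def remove_reference(conn, filename, partition_id):
    # Only the reference is deleted; the row in `files` is left in place.
    with conn:
        conn.execute(
            "DELETE FROM file_references WHERE filename = ? AND partition_id = ?",
            (filename, partition_id),
        )


def files_ready_for_gc(conn):
    # A file with no remaining references is a garbage collection candidate.
    rows = conn.execute(
        "SELECT f.filename FROM files f"
        " WHERE NOT EXISTS ("
        "  SELECT 1 FROM file_references r WHERE r.filename = f.filename)"
        " ORDER BY f.filename"
    ).fetchall()
    return [row[0] for row in rows]
```

With this shape, normal queries on `file_references` would not need to filter out dummy entries. The trade-off is that adding a file touches two tables instead of one.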

patchwork01 · 2024-10-11T07:42:59Z

docs/designs/postgresql-state-store.md

+tasks running at the same time and they all have a connection to the instance, that will put the instance under load
+even if those connections are idle. Each execution of a compaction job should create a PostgreSQLStateStore, do the
+checks it needs and then close the connection. We could add a method to the state store implementation that closes
+any internal connections or we could create the state store with a connection supplier than provides a connection


It looks like "than" should be "that".
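For reference, the connection supplier option in the quoted text could look roughly like the sketch below. It is written in Python with sqlite3 as a stand-in for PostgreSQL. The class name `StateStoreSketch`, the method `count_file_references` and the `file_references` table are hypothetical. The store holds only a callable that provides a connection, and each operation closes the connection it obtained, so no connection sits idle between operations.

```python
import sqlite3
from contextlib import closing


class StateStoreSketch:
    """Hypothetical state store that never holds a long-lived connection."""

    def __init__(self, connection_supplier):
        # connection_supplier is any zero-argument callable returning a new connection.
        self._connection_supplier = connection_supplier

    def count_file_references(self):
        # Open a connection for this one operation and close it when done.
        with closing(self._connection_supplier()) as conn:
            row = conn.execute("SELECT COUNT(*) FROM file_references").fetchone()
            return row[0]
```

A compaction task would then construct the store with something like `StateStoreSketch(lambda: sqlite3.connect(path))`, run its checks, and leave nothing open afterwards.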

patchwork01 · 2024-10-11T07:48:28Z

docs/designs/postgresql-state-store.md

+the idea of asynchronous commits from the transaction log state store, i.e. an SQS queue that triggers a lambda to
+perform the updates. However, in this case we do not want it to be a FIFO queue as we want to be able to make
+concurrent updates. We can set the maximum concurrency of the lambda to control the number of simultaneous updates to
+the state store.


The maximum concurrency would be shared between all Sleeper tables. If a large number of tables were being actively updated, this wouldn't necessarily control the number of simultaneous updates to any one state store, because we'd need to set it high enough to cover all the tables. It seems unlikely that would cause a problem though.

patchwork01 · 2024-10-11T07:49:43Z

docs/designs/postgresql-state-store.md

-
-This has some differences to the rest of Sleeper, which is designed to scale to zero by default. Aurora Serverless v2
-does not support scaling to zero. This means there would be some persistent costs unless we explicitly pause the Sleeper
-instance and stop the database entirely.


Can we keep this information about Aurora in the document?

patchwork01 · 2024-10-11T07:56:35Z

docs/designs/postgresql-state-store.md

-With higher levels of transaction isolation, you can produce the same behaviour as a conditional update in DynamoDB.
-If a conflicting update occurs at the same time, this will produce a serialization failure. This would require you to
-retry the update. There may be other solutions to this problem, but this may push us towards keeping transactions as
-small as possible.


It still seems possible this will be a problem. From the manual it sounds like there are cases where page locks are acquired even with serializable isolation level. This could still affect us even when we're certain we never update the same records in multiple places.

I can imagine myself coming back to look at this and not being able to find everything. It might be useful to keep a record of the logic behind this, at least why we use serializable isolation level, what we were worried about, and some detail of how it affects us. Can we bring some of this back?
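As a reminder of what "retry the update" in the removed text involves, here is a minimal sketch of a retry loop. PostgreSQL reports a serialization failure with SQLSTATE 40001. The `SerializationFailure` class below is a stand-in for whatever exception the database driver raises in that case, and `run_with_retries` and its parameters are hypothetical names. This only illustrates the pattern and makes no claim about how the state store would implement it.

```python
import time


class SerializationFailure(Exception):
    """Stand-in for the driver error raised on SQLSTATE 40001."""


def run_with_retries(transaction, max_attempts=5, base_delay_seconds=0.0):
    # `transaction` is a zero-argument callable that runs one whole
    # serializable transaction and raises SerializationFailure on conflict.
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                # Give up and let the caller see the failure.
                raise
            # Back off a little longer after each failed attempt.
            time.sleep(base_delay_seconds * attempt)
```

Because the whole transaction is re-run on each attempt, keeping transactions small limits how much work a conflict throws away.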

Updated docs/designs/postgresql-state-store.md based on initial exper…

84cc9cb

…imentation

gaffer01 changed the title ~~Updated docs/designs/postgresql-state-store.md based on initial exper…~~ Issue-3456 Updated docs/designs/postgresql-state-store.md based on initial exper… Oct 10, 2024

gaffer01 changed the title ~~Issue-3456 Updated docs/designs/postgresql-state-store.md based on initial exper…~~ Issue 3456 Updated docs/designs/postgresql-state-store.md based on initial exper… Oct 10, 2024

patchwork01 reviewed Oct 11, 2024


patchwork01 assigned gaffer01 Oct 11, 2024
